Interleaving Strings
Let's solve the Interleaving Strings problem using Dynamic Programming.
Statement#
Given strings s1, s2, and s3, find whether an interleaving of s1 and s2 forms s3.
An interleaving of two strings a and b is a configuration where a and b splits into and substrings, respectively, such that:
- a + + … +
- b + + … +
- The interleaving can follow either of the following two formats:
- can be a consecutive sequence of characters in string
s1. - can be a consecutive sequence of characters in string
s2.
Note: is the concatenation of the strings and .
Let’s say we have two strings, “abc” and “xyz”. We may interleave them in multiple ways. We may alternately pick one character from each string until we have used up all the characters, giving us “axbycz” and “xaybzc”. Or, we may try to pick two characters at a time from each string, giving us “abxycz” and “xyabzc”. If we pick three characters at a time, we get “abcxyz” and “xyzabc”. Another class of variations is to pick a variable number of characters from each string, which leads to many more possibilities, such as “abxcyz”, “xayzbc”, and so on. However, “bxaycz” is not a valid interleaving, as the order of the characters taken from the first string is not preserved, since “b” appears here before “a”. Similarly, “acxybz” is not a valid interleaving.
Constraints:
-
s1.length,s2.length -
s3.length s1,s2, ands3consist of lowercase English letters.
Examples#
No. | s1 | s2 | s3 | Output |
1 | "xyz" | "abc" | "axbycz" | True |
2 | "abc" | "xyz" | "axbycz" | True |
3 | "abc" | "xyz" | "xaybzc" | True |
4 | "xxyzz" | "wyyzx" | "xxwyyyxzzz" | False |
5 | "" | "" | "" | True |
6 | "abd" | "cef" | "adcbef" | False |
Try it yourself#
Implement your solution in the following playground:
Note: If you clicked the “Submit” button and the code timed out, this means that your solution needs to be optimized in terms of running time.
Hint: Use dynamic programming and see the magic.
Solution#
We will first explore the naive recursive solution to this problem and then see how it can be improved using the Longest Common Substring dynamic programming pattern.
Naive approach#
A naive approach to deduce if s3 is indeed a result of interleaving s1 and s2 is to try and match the two strings with s3 one letter at a time. As an example, suppose we have the following strings:
s1= “abc”s2= “xyz”s3= “abxycz”
The first step is to verify if the length of s3 is equal to the sum of the lengths of s1 and s2. If it isn’t, return FALSE as the output. However, that isn’t the case for the given inputs. Therefore, we can traverse these strings and engage in the matching process. While traversing, we keep track of the indexes we are at in all three strings, allowing us to check the order in which these characters appear. Let’s see how it will work in this example with the help of the illustration below:
1 of 13
2 of 13
3 of 13
4 of 13
5 of 13
6 of 13
7 of 13
8 of 13
9 of 13
10 of 13
11 of 13
12 of 13
13 of 13
As we can see, we have two options at any step:
- If the letter at s1 index matches with the letter at s3 index, we consume both, that is, we move s1 index as well as s3 index one step forward. The matching process continues recursively for the portions of
s1,s2, ands3that we have not yet considered. - If the letter at s2 index matches with the letter at s3 index, we consume both, that is, we move s2 index as well as s3 index one step forward. The matching process continues recursively for the portions of
s1,s2, ands3that we have not yet considered.
This process allows us to check all possible combinations of interleaving.
Let’s implement this naive approach:
Note: Please observe that if you include the test case commented out in the driver code, the solution is likely to time out. Try to solve the larger problem using the dynamic programming solutions provided below and see the difference.
Time complexity#
The time complexity of the above algorithm is , where s and t are the lengths of s1 and s2, respectively.
Space complexity#
The space complexity of this naive approach is , where and are the lengths of s1 and s2, respectively. The solution uses the stack to store the data for each recursive call, and there may be at most calls in any given branch of the recursive call tree.
Optimized solution using dynamic programming#
As the naive approach is costly in terms of time complexity, we can try to optimize the solution using dynamic programming. Let’s see if this problem satisfies both conditions of dynamic programming:
-
Optimal substructure: When we are presented with the problem of checking whether two strings,
s1ands2have been interleaved to creates3, we compare the first character ofs1with the first character ofs3, and if they match, we check the remaining characters ofs1, the original characters ofs2, and the remaining characters ofs3. Denoting the length ofs1by , the length ofs2by and the length ofs3by , we can see that solving the problem for involves solving the smaller problem, . In case it’s the first character ofs2that matches the first character ofs3, we consume these matching characters and check the original characters ofs1, the remaining characters ofs2and the remaining characters ofs3. That is, we now need to solve the smaller problem represented by . As solving a smaller version of the overall problem helps to solve it, this problem has the optimal substructure property. -
Overlapping subproblems: The following illustration highlights the overlapping subproblems. There are two recursive calls for
s1= “b”,s2= “b”,s3= “bb” (highlighted in peach), four calls fors1= “”,s2= “”,s3= “” (highlighted in green), and so on. The function is computing certain results over and over again, which is inefficient. Therefore, this problem has the overlapping subproblems property.
Top-down solution#
The top-down solution improves on the recursive solution. Instead of repeatedly solving the same problems, we store the result of each subproblem and reuse it when needed. Let’s look at an example that would demonstrate this phenomenon:
s1= “ab”s2= “ab”s3= “aabb”
As illustrated above, the recursive solution involves solving several overlapping subproblems. However, it can be optimized with the help of memoization. In the recursive function, the indexes of s1, s2, and s3 uniquely identify each subproblem. So, we can store these subproblems in a hash table indexed by s1, s2, and s3 and look them up when required.
Let’s implement this memoization approach:
Note: Please observe that if you now uncomment the large test case in the driver code, the dynamic programming solution will still work.
Time complexity#
The time complexity of this approach is where and are the lengths of s1 and s2, respectively.
Space complexity#
The space complexity would be as we use a hash table of size to store the subproblems we solve.
Bottom-up solution#
The first step of our bottom-up solution is to check if the lengths of s1 and s2 add up to the length of s3. If this is not the case, then s3 cannot be an interleaving of s1 and s2, so we can return FALSE without proceeding further.
If we pass the initial test then we can move on to constructing our solution. The bottom-up solution starts with the initialization of our lookup table. The lookup_table is created with an extra row and column to separate the base cases from the rest of the subproblems that we will solve later. The base cases for the problem are as follows:
-
The top-left cell (
s1_index= 0 ands2_index= 0) represents the case where the interleaved strings3is empty. This cell will be TRUE if boths1ands2are empty. -
For the first column (
s1_index> 0 ands2_index= 0), each cell will be filled with TRUE as long as the prefix ofs3up to that point is equal to the current substring ofs1(s1[0]…s1[s1_index - 1]). This is because, in this case,s2is empty, so all the characters ins3must come froms1. -
For the first row (
s1_index= 0 ands2_index> 0), each cell will be filled with TRUE as long as the prefix ofs3up to that point is equal to the current substring ofs2(s2[0]…s2[s2_index - 1]). This is because, in this case,s1is empty, so all the characters ins3must come froms2.
As the base case logic outlined above indicates, each cell in the lookup table, addressed by the tuple (s1_index, s2_index), represents whether there is any way to interleave the initial s1_index characters of s1 with the initial s2_index characters of s2 to form the string represented by the initial s1_index + s2_index characters of s3.
Let’s see what we mean by this with a couple of examples:
-
Given
s1= “uvx”,s2= “wyz” ands3= “uvwxyz”,lookup_table[2][1]represents whether there is any way to interleave the initial characters froms1, “uv”, and the initial character froms2, “w”, to make “uvw”, the initial characters ins3. Since this is possible, the result of this subproblem is TRUE. -
Similarly,
lookup_table[2][2]represents whether there is any way to interleave the initial characters froms1, “uv”, and the initial characters froms2, “wy”, to make “uvwx”, the initial characters ins3. Since this is not possible, the result of this subproblem is FALSE.
Let’s walk through the process of checking whether the given strings, s1 and s2, are interleaved to form s3:
1 of 17
2 of 17
3 of 17
4 of 17
5 of 17
6 of 17
7 of 17
8 of 17
9 of 17
10 of 17
11 of 17
12 of 17
13 of 17
14 of 17
15 of 17
16 of 17
17 of 17
We traverse the remaining cells of the table from top left to bottom right. We do this in the form of a nested loop where in the outer loop, s1_index ranges from to and in the inner loop, s2_index ranges from to . This nested loop allows us to consider all possible combinations of characters from s1 and s2 that can be used to form the interleaved string s3.
For each cell, there are two possibilities for matching the current prefix of s3, expressed as s3[0:s1_index + s2_index - 1]:
-
Either we match with the current character from
s1:- The value of
lookup_table[s1_index][s2_index]is set to TRUE if the characters1[s1_index]is equal to the characters3[s1_index + s2_index - 1]and the previous characters ins1up to the indexs1_index-1have already been matched with the corresponding characters ins3(i.e., the value of the table at positionlookup_table[s1_index - 1][s2_index]is TRUE).
- The value of
-
Or, we match with the current character from
s2:- The value of
lookup_table[s1_index][s2_index]is set to TRUE if the characters2[s2_index]is equal to the characters3[s1_index + s2_index - 1]and the previous characters ins2up to the indexs2_index-1have already been matched with the corresponding characters ins3(i.e., the value of the table at positionlookup_table[s1_index][s2_index - 1]is TRUE).
- The value of
Note: Since
s1_indexands2_indexare zero-indexed, we need to subtract from the sum ofs1_indexands2_indexto get the correct index ins3. This is because the first character ins3is at index , not index .
If either one of the two conditions above is TRUE, the result of the current subproblem is TRUE, and FALSE otherwise.
The above operations are repeated until we have traversed the table entirely.
After the table has been completely filled, the bottom-right cell contains TRUE if an interleaving of s1 and s2 forms s3. Otherwise, it will contain FALSE. Hence, we return the value of the bottom-right cell as the final result.
Let's implement this algorithm:
Note: Please observe that if you now uncomment the large test case in the driver code, the dynamic programming solution will still work.
Time complexity#
The time complexity of this approach is where and are the lengths of s1 and s2, respectively.
Space complexity#
The space complexity of this approach is where and are the lengths of s1 and s2, respectively.
Distinct Subsequence Pattern Matching
Word Break II